|
In computational learning theory, induction of regular languages refers to the task of learning a formal description (e.g. grammar) of a regular language from a given set of example strings. Although Mark E. Gold has shown that not every regular language can be learned this way (see language identification in the limit), approaches have been investigated for a variety of subclasses. They are sketched in this article. For learning of more general grammars, see Grammar induction. ==Example== A regular language is defined as a (finite or infinite) set of strings that can be described by one of the mathematical formalisms called "finite automaton", "regular grammar", or "regular expression", all of which have the same expressive power. Since the latter formalism leads to shortest notations, it shall be introduced and used here. Given a set Σ of symbols (a.k.a. alphabet), a regular expression can be any of * ∅ (denoting the empty set of strings), * ε (denoting the singleton set containing just the empty string), * ''a'' (where ''a'' is any character in Σ; denoting the singleton set just containing the single-character string ''a''), * ''r''+''s'' (where ''r'' and ''s'' are, in turn, simpler regular expressions; denoting their set's union) * ''r''⋅''s'' (denoting the set of all possible concatenations of strings from ''r'' 's and ''s'' 's set), * ''r''+ (denoting the set of ''n''-fold repetitions of strings from ''r'' 's set, for any ''n''≥1), or * ''r'' * (similarly denoting the set of ''n''-fold repetitions, but also including the empty string, seen as 0-fold repetition). For example, using Σ = , the regular expression (0+1+ε)⋅(0+1) denotes the set of all binary numbers with one or two digits (leading zero allowed), while 1⋅(0+1) *⋅0 denotes the (infinite) set of all even binary numbers (no leading zeroes). Given a set of strings (also called "positive examples"), the task of regular language induction is to come up with a regular expression that denotes a set containing all of them. As an example, given , a "natural" description could be the regular expression 1⋅0 *, corresponding to the informal characterization "''a 1 followed by arbitrarily many (maybe even none) 0es''". However, (0+1) * and 1+(1⋅0)+(1⋅0⋅0) is another regular expression, denoting the largest (assuming Σ=) and the smallest set containing the given strings, and called the trivial overgeneralization and undergeneralization, respectively. Some approaches work in an extended setting where also a set of "negative example" strings is given; then, a regular expression is to be found that generates all of the positive, but none of the negative examples. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Induction of regular languages」の詳細全文を読む スポンサード リンク
|